Intro to R

R Coding Club
RTG 2660

Dr. Lea Hildebrandt

2024-01-16

R Coding Club?!?

What is a coding club?

“Coding Club is for everyone, regardless of their career stage or current level of knowledge. Coding Club is a place that brings people together, regardless of their gender or background. We all have the right to learn, and we believe learning is more fun and efficient when we help each other along the way.” (https://ourcodingclub.github.io/)

  • Code Review: Someone brings a script, we go through it and learn from it/improve it.

    • Coding “solutions”: Someone presents something neat they implemented with code.
  • Coding “problems”: Someone brings a problem and we try to solve it together.

  • Data Dojo: We take a dataset (from one of us or an openly available one, e.g. tidy tuesday) and work with it collaboratively. After deciding what to do with it, two people start: One is the driver (who live codes) and one the co-pilot (who helps and instructs). After 5 min. the co-pilot switches to driver and a new co-pilot steps up.

Schedule (preliminary)

How often do we want to meet? (Beginning weekly, maybe later bi-weekly?)

Semester break?

Session Date Topic Preparation/Notes
1 16.01.24 Intro to R & RStudio Install R and RStudio, download Dataset1, Dataset2
2 23.01.24 Data Wrangling
3 30.01.24 Randomization & Counterbalancing (Ecem)
4 06.02.24 Data Visualization
5 13.02.24 Preprocessing, e.g. physio data (Francesco)
(Semester break or continue?)
6 20.02.24 No R Club!

Why write code?

  • Doing statistical calculation by hand? Tedious & error prone! Computer is faster…

  • Using spreadsheets? Limited options, change data accidentally…

  • Using point-and-click software (e.g. SPSS)?

    • proprietary software = expensive

    • R = open, extensible (community)

    • reproducible!

  • You’ll learn to program!

Install R & RStudio

You should all have installed both by now! Who had problems doing so?

Overview RStudio

Screenshot of the RStudio Interface with different panes visible

RStudio Interface

RStudio Panes

  1. Script pane/window -> to save your code

  2. Console -> here the commands are run

  3. Environment -> which variables/dataframes are saved

  4. Files, plots, help etc. -> files shows you the files in the folder you’re currently in

RStudio Interface

Using the Console as a Calculator

100 + 1
[1] 101
2*3
[1] 6
sqrt(9)
[1] 3
Calculated 100+1, 2*3, square root of 9 directly in the console

Console used as calculator

Saving the Results as a Variable/Object

a <- 100 + 1

multi <- 2*3

SqrtOfNine <- sqrt(9)

word <- "Hello"
  • “<-” is used to assign values to variables (“=” is also possible but not preferred)

    • Hint: use Alt + - as a shortcut
  • a, multi etc. are the variable names, which can be words, no whitespace allowed

    • You can find those now in your Environment!
  • as you can see, the variables can contain different types: Numbers, strings/characters (= words) etc.

  • no output in console!

  • the variables contain the calculated value (i.e. 101) and not the calculation/formula (100+1)

  • You can use those variables for further calculations, e.g. a + multi

Working Directory

It makes sense to save all your scripts etc. in a folder specifically dedicated to this course.

  • Make sure that R knows that you want to work in this folder, i.e. set your working directory:

    • Session -> Set Working Directory -> Choose Directory
  • Assignment: Please make a folder, e.g. called “R_Club” (but not “R” or anything with spaces in it). Then set your working directory to this folder.

Scripts

If you type in all your commands/code in the console, it might get lost/you might not remember what you did, and you always have to type it in again if you want to run it again with slight changes. Also, the code in the console is not save-able.

Therefore, it is better practice to write scripts. Scripts are basically text files that contain your code.

To open a new script, click File -> New File -> R Script.

To run a line of the script, you can either click Run at the top right of the pane or press ctrl + enter. It will always run the line where the cursor is located (or the lines that you have selected with the mouse). To run the whole script, press ctrl + shift + enter.

Scripts 2

Assignment: Open a new file. In this file, write down some of the code (one command per line) that we have used so far and save the file.

Now run the code (either by pressing “run” at the top right of the script or ctrl + enter).

Functions

You might have noticed sqrt(9) earlier. sqrt() is an R function that calculates the square root of a number. 9 is the argument that we hand over to the function.

If you want to know what a function does, which arguments it takes, or which output it generates, you can type ?functionname() in the console, e.g.

?sqrt()

This will open the help file in the Help Pane on the lower right of RStudio.

Functions 2

Functions often take more than one argument:

rnorm(n = 6, mean = 3, sd = 1)
# rnorm(6, 3, 1)

# By the way, # denotes a comment (ignored by R), which can be helpful for scripting!

You can explicitly state which argument you are handing over (check the help file for the argument names!) or just state the values (but these have to be in the correct order then! See help file).

Packages

There are a number of functions that are already included with Base R, but you can greatly extend the power of R by loading packages. Packages are like libraries of functions that someone else wrote.

You can load a package using the install.packages() function:

install.packages("tidyverse")

(It may be necessary to install Rtools: https://cran.r-project.org/bin/windows/Rtools/)

But installing is not enough to be able to actually use the functions from that package. You’d also need to load the package with the libary() function:

library("tidyverse") # or library(tidyverse)

Assignment: Install and load the tidyverse package (which we will use a lot in this course).

Working with Data

Read in Data

To read in data files, you need to know which format these files have, e.g. .txt. or .csv files or some other (proprietary) format. There are packages that enable you to read in data of different formats.

We will use the files from Fundamentals of Quantitative Analysis. Save these in your course folder on your computer (do not open them!). Set your working directory to the course folder.

Delete the text/code in the R Script you just worked on (and add a comment as a header: “Working with data”). Underneath, add the following code:

# library(tidyverse) # we will use a function from the tidyverse to read in the data

dat <- read_csv("ahi-cesd.csv")
pinfo <- read_csv("participant-info.csv")

Run the code!

Looking at the Data

There are several options to get a glimpse at the data:

  • Click on the object/variable name in your Environment.

  • Type View(NameOfObject) in your console, e.g. View(dat).

  • In the console, type in str(dat) or str(pinfo) to get an overview of the data.

  • In the console, type in summary(dat).

  • In the console, type in head(dat).

  • What is the difference between these commands?

Looking at the Data 2

What is the difference to the objects/variables, that you assigned/saved in your Environment earlier and these objects?

The two objects we just read in are data frames, which consists of full datasets. The objects we assigned earlier were simpler variables, which only consisted of single values/words.

Data frames usually have several rows and columns. Remember, the columns are the variables and the rows are the observations.

Other Script/Project Options

  • R Markdown

  • Quarto

  • Projects

R Markdown

R scripts are a good way to save your code. However, you’d better heavily comment in your scripts, so that future you (and potentially collaborators) know what happens in your script.

An alternative is an R Markdown file. This is also a sort of script, but you can write text (like in a word processor) and mix it with code chunks, where you can write your R code. R Markdown is the “language” you use to write in these files, which is a variety of Markdown.

The advantage of R Markdown files (ending with .Rmd) is that they increase reproducibility. For example, you can write whole reports in R Markdown (and also these slides are made with it!).

A newer variant is called quarto, which works very similar (but is more flexible) to R Markdown.

R Markdown 2

R script

R Markdown

R Markdown rendered as html report

R Markdown 3

Assignment:

  • Open a new .Rmd file, change/insert the title and author.

  • Check out the content of it.

  • Delete and add some of the text on the white background. Change the Header (indicated by ##) to “About me” and write something about yourself underneath.

  • Switch between “Source” and “Visual” in the top left. What changes? What is “Visual”?

  • In the grey boxes (“code chunks”), add some code. Try to find out how you can add a new code chunk.

    • hint: The green C with the + on the top right will do so (or using “insert” in the visual view)
  • Save the file with a sensible name.

  • What happens when you click on “Knit” (top of Script pane)?

    • Click on the little arrow next to knit and select “Knit to PDF”
  • insert inline code

R Markdown 4

There are many useful things you can so with R Markdown: Adding different headers, adding inline code, knitting as a PDF, adding pictures or tables…You can also decide whether the code chunks should be visible in the output etc.

For further information, check out the R Markdown cheatsheet: https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf

Thanks!

That’s the lesson on “Getting started with R”!

Next week, we’ll talk about models & probability and learn how to wrangle (= preprocess) data in R!